Genome sequencing: mind the gap

Author

  • Jay Withgott
Abstract

The year 2000 is poised to be declared the ‘Year of the Genome’. The sequences of human chromosomes 21 and 22 have been published in the past six months, and two further landmark announcements are expected within days. Indeed, they may already have been made by the time you read this. Scientists of the Human Genome Project are on the verge of announcing completion of a “rough draft” of the human genome. And the world should soon hear another stunning proclamation from Celera Genomics, which recently completed a genome draft and is now assembling its data into near-final form. But amid all the bluster about who will deliver the sequence first, little attention has been paid to the quality and completeness of the data we will have once the sensational announcements are made and the hoopla has died down.

What, for instance, is meant by a ‘rough draft’? The rough draft from the Human Genome Project will cover 90% of the roughly 3.2 billion bases of the human genome. The Project is an international consortium of labs funded primarily by the US government and the UK’s Wellcome Trust. Celera says its soon-to-be-finished product (for which it is also using the public-domain data) will be 99% complete. The criteria for completion of these stages are arbitrary. Finishing the job will be the hardest part, a matter of diminishing returns. Tackling those remaining percentage points means closing thousands of gaps in the sequence, and that may take 1–3 years.

This raises the question: are the not-quite-finished data we are about to see reliable, and useful for biomedical research? Or are the numerous gaps in the sequence a tiny problem with huge effects? To answer these questions, it is important to know why the gaps exist at all. There are two types of gap: one is a statistical artifact; the other is biologically meaningful. Statistical gaps are a byproduct of the ‘shotgun’ sequencing methods used by both the public and private efforts.

DNA is chopped randomly into small pieces. Enough pieces are sequenced that, on average, any given point in the genome is covered several times over. Computer programs then search for overlaps among the sequences and use them to connect and order the fragments into a linear whole. By chance, some stretches will be covered many times, whereas others will be missed completely; the more sequencing is done, the fewer gaps remain.
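A small simulation can illustrate why random shotgun coverage leaves gaps. This is a simplified sketch, not the method used by either sequencing effort. It makes several assumptions:

- Reads all have one fixed length.
- Read start positions are uniformly random on a single linear sequence.
- A base counts as covered if any read overlaps it.
- Repeats, sequencing errors and the minimum overlap an assembler needs are ignored.

Under those assumptions, the classic Lander-Waterman (Poisson) approximation says a base is missed with probability of about e^(-c) at average coverage c. That is roughly 37% of bases at 1x coverage but only about 0.7% at 5x. The function names `expected_uncovered_fraction` and `simulate_shotgun`, and all parameter values, are hypothetical and chosen for illustration.

```python
import math
import random


def expected_uncovered_fraction(coverage: float) -> float:
    """Poisson (Lander-Waterman) approximation: the chance that a given
    base is hit by none of the reads when mean coverage is `coverage`."""
    return math.exp(-coverage)


def simulate_shotgun(genome_length: int, read_length: int,
                     coverage: float, seed: int = 0) -> tuple[float, int]:
    """Toy shotgun experiment (hypothetical helper, simplified model).

    Reads of a fixed length start at uniformly random positions until the
    requested mean coverage is reached. Returns the fraction of bases no
    read touches and the number of gaps, where a gap is a maximal run of
    consecutive uncovered bases. Repeats, errors and minimum-overlap rules
    are ignored.
    """
    if not 0 < read_length <= genome_length:
        raise ValueError("read_length must be between 1 and genome_length")
    rng = random.Random(seed)
    n_reads = round(coverage * genome_length / read_length)

    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for _ in range(n_reads):
        start = rng.randrange(genome_length - read_length + 1)
        diff[start] += 1
        diff[start + read_length] -= 1

    # Sweep left to right, tracking read depth at each base.
    depth = uncovered = gaps = 0
    in_gap = False
    for pos in range(genome_length):
        depth += diff[pos]
        if depth == 0:
            uncovered += 1
            if not in_gap:  # first uncovered base of a new gap
                gaps += 1
                in_gap = True
        else:
            in_gap = False
    return uncovered / genome_length, gaps


if __name__ == "__main__":
    # Toy genome of 200,000 bases with 500-base reads (illustrative values).
    for c in (1, 3, 5, 8):
        frac, gaps = simulate_shotgun(200_000, 500, c, seed=1)
        print(f"{c}x: expected {expected_uncovered_fraction(c):.4f}, "
              f"simulated {frac:.4f}, gaps {gaps}")
```

Running the loop should show the simulated uncovered fraction roughly tracking e^(-c) while the number of gaps falls steeply with added coverage. There is one small excess: in this model, uniform read starts sample the two ends of the sequence less often. The difference-array sweep keeps the sketch linear in genome length, so it stays fast even in pure Python.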




Journal:
  • Current Biology

Volume 10


Published: 2000